Abstract

Differential privacy is a recent notion of privacy for statistical databases that provides rigorous,
meaningful confidentiality guarantees, even in the presence of an attacker with access to arbitrary
side information.

We show that for a large class of parametric probability models, one can construct a differentially
private estimator whose distribution converges to that of the maximum likelihood estimator.
In particular, it is efficient and asymptotically unbiased. This result provides (further) compelling
evidence that rigorous notions of privacy in statistical databases can be consistent with statistically
valid inference.
1 Introduction

Privacy is a fundamental problem in modern data analysis. Increasing volumes of personal and sensitive
data are collected by government and other organizations. The potential social benefits of analyzing
these databases are significant; at the same time, releasing information from repositories of sensitive
data can cause devastating damage to privacy. The challenge is to discover and release global
characteristics of these databases, without compromising the privacy of the individuals whose data they
contain.

There is a vast body of work on this problem in statistics and computer science. However, until
recently, most schemes proposed in the literature lacked rigorous analysis of privacy and utility. Few
works even formulated a precise definition of their schemes' conjectured properties. 

In this paper, we explore the potential of differential privacy, a definition of privacy due to Dwork
et al. [16] that emerged from a line of work in cryptography [11, 18, 4]. This notion of privacy makes 
assumptions neither about what kind of attack might be perpetrated based on the released statistics, 
nor about what additional information the attacker might possess. It resolves a number of problems 
present in previous attempts at a definition. In particular, it provides precise guarantees in the presence 
of arbitrary side information available to the adversary but unknown to the organization that is releasing 
information. 

Specifically, we show that for well-behaved parametric probability models, one can construct a 
differentially-private estimator whose distribution converges to that of the MLE. In particular, it is 
efficient and asymptotically unbiased. This provides (further) strong evidence that rigorous notions of 
database privacy can be consistent with statistically valid inference. 
Differential Privacy The problem of identifying which information in the database is safe to release
has generated a vast body of work, both in statistics and computer science. Until recently, there
were two nearly disjoint fields studying the data privacy problem: "statistical disclosure limitation"
(also known as "data confidentiality"), initiated by the statistics community in the 1960s, and
"privacy-preserving data mining", active in the database community during the 1980s and rekindled at the turn
of the 21st century by researchers in data mining. The literature in both fields is far too vast to survey 
here. For some pointers to the broader literature in statistics, see [33, 8, 9, 10, 31, 21]. For early work 
in computer science, see the survey in [1]. Recent work in data mining was started by [2] and led to an 
explosion of literature. For (partial) references, see [6, 23, 32]. 

However, the schemes proposed in these fields lack rigorous analysis of privacy. Typically, the 
schemes have either no formal privacy guarantees or ensure security only against a specific suite of 
attacks. This leaves them potentially vulnerable to unforeseen attacks, and makes it difficult to compare 
different schemes because each of them is basically solving a different problem. 

A recent line of work [11, 20, 18, 4, 16, 15, 12, 29, 17, 27, 3, 30, 5, 19, 25, 24], called private data 
analysis, seeks to place data privacy on more firm theoretical foundations and has been successful at 
formulating a strong, yet attainable privacy definition. The intuition behind the definition, which is due 
to Dwork et al. [16], is that whether an individual supplies her actual or fake information has almost 
no effect on the outcome of the analysis. Roughly, a randomized algorithm that takes sensitive data as 
input and outputs a product for publication is considered privacy-preserving if databases that differ in 
one entry induce nearby distributions on its outcomes (see below for a precise definition). 

A number of techniques for designing differentially private algorithms are now known. These are 
surveyed by Dwork [13, 14] and Nissim [28]. 

Our Contribution This paper provides a qualitatively different result from previous work, in that it 
relates the perturbation added for differential privacy to the provably optimal error of point estimators. 
For a broad class of problems, we show that differential privacy can be provided at an asymptotically 
vanishing cost to accuracy. 

Specifically, we show a modification to the maximum likelihood estimator for parametric models
which satisfies differential privacy and is asymptotically efficient, meaning that the averaged squared
error of the estimator is $(1 + o(1)) / (n I_f(\theta))$, where $n$ is the number of samples in the input, $I_f(\theta)$
denotes the Fisher information of $f$ at $\theta$, and $o(1)$ denotes a function that tends to zero as $n$ tends to
infinity. Differential privacy is quantified by a parameter $\epsilon > 0$ which measures information leakage;
our estimator satisfies differential privacy with $\lim_{n \to \infty} \epsilon = 0$.

2 Definitions 

Consider a parameter estimation problem defined by a model $f(x; \theta)$ where $\theta$ is a real-valued vector in
a bounded space $\Theta \subseteq \mathbb{R}^p$ of diameter $\Lambda$, and $x$ takes values in a domain $D$ (typically, either a real vector space
or a finite, discrete set).

We will generally use the following notational convention: capital Latin letters ($X$, $T$, etc.) refer to
random variables or processes. Their lower case analogues refer to fixed, deterministic values of these
random objects (i.e., scalars, vectors, or functions).

Given i.i.d. random variables $X = (X_1, \ldots, X_n)$ drawn according to the distribution $f(\cdot; \theta)$, we
would like to estimate $\theta$ using an estimator $t$ that takes as input the data $x$ as well as an additional,
independent source of randomness $R$ (used, in our case, for perturbation):

$$\theta \;\longrightarrow\; X \;\longrightarrow\; t(X, R) = T(X) \;\longleftarrow\; R$$

Even for a fixed input $x = (x_1, \ldots, x_n) \in D^n$, the estimator $T(x) = t(x, R)$ is a random variable
distributed in the parameter space $\mathbb{R}^p$. For example, it might consist of a deterministic function value
that is perturbed using additive random noise, or it might consist of a sample from a posterior
distribution constructed based on $x$. We will use the capital letter $X$ to denote the random variable, and lower
case $x$ to denote a specific value in $D^n$. Thus, the random variable $T(X)$ is generated from two sources
of randomness: the samples $X$ and the random bits used by $T$.

Differential Privacy We say two fixed data sets $x$ and $x'$ in $D^n$ are neighbors if $x$ and $x'$ agree in all
but one position, that is, $x_j = x'_j$ for all $j \neq i$, for some $i$.



Differential privacy compares the distributions of $T(x) = t(x, R)$ and $T(x') = t(x', R)$
corresponding to neighboring data sets $x, x'$. It requires that for all possible pairs of neighboring data sets,
the corresponding distributions be close:

Definition 2.1. A randomized algorithm $T(\cdot)$ is $\epsilon$-differentially private if for all neighboring pairs of
databases $x$ and $x'$, and for all measurable subsets of outputs (events) $S$:

$$\Pr(T(x) \in S) \le e^{\epsilon} \times \Pr(T(x') \in S) .$$

This condition states that no single point in the input set can significantly influence the distribution
of the estimator. Note that the privacy condition makes no reference to a distribution on $x$. It is a
"worst-case" notion of privacy that provides a guarantee even when our modeling of the distribution on
$X$ is incorrect.

Given two probability measures $p$ and $q$ on a space $\Omega$, we can define the multiplicative distance
between $p$ and $q$ to be

$$d_\times(p, q) = \sup_{S \subseteq \Omega} \left| \ln \frac{p(S)}{q(S)} \right| .$$

(We say $d_\times(p, q) = \infty$ when the supremum above doesn't exist.) Thus, $\epsilon$-differential privacy requires
that, for all fixed neighboring data sets $x$ and $x'$, the multiplicative distance between (the distributions
of) $T(x)$ and $T(x')$ be at most $\epsilon$. The exact choice of distance function significantly affects the practical
meaning of differential privacy; see Section 4, Remark 2 in [16] and [25] for discussion.

The MLE and Efficiency Many methods exist to measure the quality of a point estimator. In this
paper, we consider the expected squared deviation from the real parameter $\theta$. For a one-dimensional
parameter ($p = 1$), this can be written:

$$J_T(\theta) = E_\theta\left( (T(X) - \theta)^2 \right)$$
The notation $E_\theta(\ldots)$ refers to the fact that $X$ is drawn i.i.d. according to $f(\cdot; \theta)$. If $T(X)$ is unbiased,
then $J_T(\theta)$ is simply the variance $\mathrm{Var}_\theta(T(X))$. Note that all these notations are equally well-defined
for a randomized estimator $T(x) = t(x, R)$. The expectation is then also taken over the choice of $R$,
i.e., $J_T(\theta) = E_\theta\left( (t(X, R) - \theta)^2 \right)$.

(Mean squared error can be defined analogously for higher-dimensional parameter vectors. For
simplicity we focus here on the one-dimensional case. The development of a higher-dimensional analogue
is identical, as long as $p$ is constant with respect to $n$.)

The maximum likelihood estimator $\hat{\theta}_{\mathrm{MLE}}(x)$ returns a value $\hat{\theta}$ that maximizes the likelihood function
$L(\theta) = \prod_i f(x_i; \theta)$, if such a maximum exists. It is a classic result that, for well-behaved parametric
families, $\hat{\theta}_{\mathrm{MLE}}$ exists with high probability and is asymptotically normal, centered around the true
value $\theta$. Moreover, its expected squared error is governed by the inverse of the Fisher information at $\theta$,

$$I_f(\theta) = E_\theta\left( \left[ \frac{\partial}{\partial\theta} \ln f(X_1; \theta) \right]^2 \right) .$$

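Lemma 2.1. Under appropriate regularity conditions, the MLE converges in distribution to a Gaussian
centered at $\theta$, that is, $\sqrt{n} \cdot (\hat{\theta}_{\mathrm{MLE}} - \theta) \xrightarrow{D} N\left(0, \frac{1}{I_f(\theta)}\right)$. Moreover, $J_{\hat{\theta}_{\mathrm{MLE}}}(\theta) = \frac{1 + o(1)}{n I_f(\theta)}$, where $o(1)$ denotes
a function of $n$ that tends to zero as $n$ tends to infinity.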
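
As a concrete illustration (an example chosen here, not one taken from the text), consider the Bernoulli model $f(x; \theta) = \theta^x (1 - \theta)^{1 - x}$ with $x \in \{0, 1\}$ and $0 < \theta < 1$. The Fisher information, that is, the expected squared derivative of the log-likelihood, can be worked out directly:

```latex
\begin{align*}
\frac{\partial}{\partial\theta} \ln f(x; \theta)
  &= \frac{x}{\theta} - \frac{1 - x}{1 - \theta}
   = \frac{x - \theta}{\theta (1 - \theta)}, \\
I_f(\theta)
  &= E_\theta\!\left( \frac{(X_1 - \theta)^2}{\theta^2 (1 - \theta)^2} \right)
   = \frac{\theta (1 - \theta)}{\theta^2 (1 - \theta)^2}
   = \frac{1}{\theta (1 - \theta)} .
\end{align*}
```

In this model the MLE is the sample mean, whose variance $\theta(1 - \theta)/n$ equals $1/(n I_f(\theta))$ exactly, so the $o(1)$ term in Lemma 2.1 vanishes.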

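The MLE asymptotically has optimal expected squared error among unbiased estimators: by the Cramér-Rao
bound, no unbiased estimator has expected squared error below $1/(n I_f(\theta))$. Estimators that match this
bound are called efficient.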

Definition 2.2. An estimator $T$ is asymptotically efficient for a model $f(\cdot; \cdot)$ if, for all $\theta$, the expected
squared error converges to that of the MLE, that is, for all $\theta \in \Theta$, $J_T(\theta) \le \frac{1 + o(1)}{n I_f(\theta)}$.



Bias Correction The asymptotic efficiency of the MLE implies that its bias, $b_{\mathrm{MLE}}(\theta) = E_\theta(\hat{\theta}_{\mathrm{MLE}} - \theta)$,
goes to zero more quickly than $1/\sqrt{n}$. However, in our main result, we will need an estimator with
much lower bias. This can be obtained via a (standard) process known as bias correction.

Under appropriate regularity assumptions, we can describe the bias of MLE precisely, namely

$$b_{\mathrm{MLE}}(\theta) = \frac{b_1(\theta)}{n} + O\left( n^{-3/2} \right)$$

where $b_1(\theta)$ has a uniformly bounded derivative (see, for example, discussions in Cox and Hinkley [7],
Firth [22], and Li [26]). Several methods exist for correcting this bias. The simplest is to subtract off
an estimate of the leading term, using $b_1(\hat{\theta}_{\mathrm{MLE}})$ to estimate $b_1(\theta)$; the result is called the bias-corrected
MLE,

$$\hat{\theta}_{bc} = \hat{\theta}_{\mathrm{MLE}} - b_1(\hat{\theta}_{\mathrm{MLE}}) / n .$$
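
To make the correction concrete, here is a minimal sketch for a model chosen purely for illustration (it does not appear in the text): the exponential distribution with rate $\theta$, ignoring the requirement that the parameter space be bounded. Its MLE $n / \sum_i x_i$ has expectation $n\theta/(n-1)$ for $n \ge 2$, so the bias is $\theta/n + O(n^{-2})$ and the leading coefficient is $b_1(\theta) = \theta$. The function names are hypothetical.

```python
def mle_rate(xs):
    """MLE of the rate of an exponential distribution: n / sum(x)."""
    return len(xs) / sum(xs)


def bias_corrected_rate(xs):
    """Subtract the estimated leading bias term b_1(theta_hat) / n.

    For the exponential rate, b_1(theta) = theta (an assumption specific
    to this illustrative model), so the correction is theta_hat / n.
    """
    n = len(xs)
    theta_hat = mle_rate(xs)
    return theta_hat - theta_hat / n


sample = [0.5, 1.5, 1.0, 1.0]        # hypothetical data with sum 4
print(mle_rate(sample))              # 1.0
print(bias_corrected_rate(sample))   # 0.75
```

In this particular model the corrected estimator equals $(n-1)/\sum_i x_i$, which is exactly unbiased; in general the correction only removes the $1/n$ term of the bias.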



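Lemma 2.2. The bias-corrected MLE $\hat{\theta}_{bc} = \hat{\theta}_{\mathrm{MLE}} - b_1(\hat{\theta}_{\mathrm{MLE}}) / n$ converges at the same rate as the MLE
but with lower bias, namely,

$$\sqrt{n} \cdot (\hat{\theta}_{bc} - \theta) \xrightarrow{D} N\left(0, \frac{1}{I_f(\theta)}\right) \quad \text{and} \quad b_{bc} = E_\theta(\hat{\theta}_{bc} - \theta) = O\left( n^{-3/2} \right) .$$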
3 A Private, Efficient Estimator



We can now state our main result: 

Theorem 3.1. Under appropriate regularity conditions, there exists a (randomized) estimator $T^*$ which
is asymptotically efficient and $\epsilon$-differentially private, where $\lim_{n \to \infty} \epsilon = 0$.

More precisely, the construction takes as input the parameter $\epsilon$ and produces an estimator $T^*$ with
mean squared error $\frac{1}{n I_f(\theta)} \left( 1 + O\left( n^{-1/5} \epsilon^{-6/5} \right) \right)$. Thus, as long as $\epsilon$ goes to 0 more slowly than $n^{-1/6}$, the
estimator will be asymptotically efficient.

The idea is to apply the "sample-and-aggregate" method of [29], similar in spirit to the parametric 
bootstrap. The procedure is quite general and can be instantiated in several variants. We present a 
particular version which is sufficient to prove our main theorem. 

The estimator $T^*$ takes the data $x$ as well as a parameter $\epsilon > 0$ (which measures information
leakage) and a positive integer $k$ (to be determined later). The idea is to break the input into $k$ blocks
of $n/k$ points each, compute the (bias-corrected) MLE on each block, and release the average of these
estimates plus some small additive perturbation. The procedure is given in Algorithm 1 and illustrated
in Figure 1.

Algorithm 1 On input $x = (x_1, \ldots, x_n) \in D^n$, $\epsilon > 0$ and $k \in \mathbb{N}$:
1: Arbitrarily divide the input $x$ into $k$ disjoint sets $B_1, \ldots, B_k$ of $t = n/k$ points.
   We call these $k$ sets the blocks of the input.
2: for each block $B_j = \{x_{(j-1)t+1}, \ldots, x_{jt}\}$, do
3:    Apply the bias-corrected MLE $\hat{\theta}_{bc}$ to obtain an estimate $z_j = \hat{\theta}_{bc}(x_{(j-1)t+1}, \ldots, x_{jt})$.
4: end for
5: Compute the average estimate: $\bar{z} = \frac{1}{k} \sum_{j=1}^{k} z_j$.
6: Draw a random observation $R$ from a double-exponential (Laplace) distribution with standard
   deviation $\sqrt{2} \cdot \Lambda / (k\epsilon)$, that is, draw $R \sim \mathrm{Lap}\left(\frac{\Lambda}{k\epsilon}\right)$ where $\mathrm{Lap}(\lambda)$ is the distribution on $\mathbb{R}$ with density
   $h(y) = \frac{1}{2\lambda} e^{-|y|/\lambda}$. (Recall that $\Lambda$ is the diameter of the parameter space $\Theta$.)
7: Output $T^* = \bar{z} + R$.



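The resulting estimator has the form

$$T^*(x) = \frac{1}{k} \sum_{j=1}^{k} \hat{\theta}_{bc}\left( x_{(j-1)t+1}, \ldots, x_{jt} \right) + \mathrm{Lap}\left( \frac{\Lambda}{k\epsilon} \right) .$$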
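
The procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than a reference implementation: the function names are hypothetical, the parameter space is taken to be the interval [0, diameter], block estimates are clipped to that interval so that the sensitivity bound used in the privacy argument holds, and leftover points are dropped when k does not divide n. The usage example assumes Bernoulli data, for which the MLE (the sample mean) is already unbiased and needs no correction.

```python
import random


def laplace(scale, rng):
    """Sample Lap(scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_estimate(x, eps, k, diameter, block_estimator, rng=random):
    """Average of k block estimates plus Laplace noise of scale diameter / (k * eps)."""
    t = len(x) // k                      # block size; leftover points are dropped
    if t == 0:
        raise ValueError("need at least k data points")
    z = []
    for j in range(k):
        block = x[j * t:(j + 1) * t]
        # Clip so that each block estimate lies in [0, diameter].
        z.append(min(max(block_estimator(block), 0.0), diameter))
    z_bar = sum(z) / k
    return z_bar + laplace(diameter / (k * eps), rng)


def bernoulli_mle(block):
    """Sample mean; unbiased for Bernoulli data, so no bias correction is applied."""
    return sum(block) / len(block)


rng = random.Random(0)
data = [1 if rng.random() < 0.3 else 0 for _ in range(10_000)]  # hypothetical sample
print(private_estimate(data, eps=0.5, k=100, diameter=1.0,
                       block_estimator=bernoulli_mle, rng=rng))
```

Passing the block estimator as an argument keeps the sketch independent of the model; any bias-corrected MLE with values in the bounded parameter space could be substituted.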



Lemma 3.2 ([4, 29]). For any choice of the number of blocks $k$, the estimator $T^*$ is $\epsilon$-differentially
private.

The lemma follows from the general techniques developed in [16, 29]; we include a direct proof 
here for completeness. 

Proof. Fix a particular value of $x$, and consider the effect of changing a single entry $x_i$ to obtain a
database $x'$ (for any particular index $i$). At most one of the numbers $z_j$ can change, depending on
the block which contains $x_i$. The number $z_j$ that changes can go up or down by at most $\Lambda$, since the
parameter takes values in $[0, \Lambda]$. This means that the mean $\bar{z}$ can change by at most $\Lambda/k$.



[Figure 1 diagram: Laplace noise $R$ is added to the block average $\bar{z}$ to produce the output $T^* = \bar{z} + R$.]

Figure 1: The estimator $T^*$. When the number of bins $k$ is $o(n^{2/3})$ and $\epsilon$ is not too small, $T^*$ is
asymptotically efficient (Lemma 3.3).



The random variables $T^*(x)$ and $T^*(x')$ are thus Laplace random variables with identical standard
deviations and means differing by at most $\Lambda/k$. Let $h_x$ and $h_{x'}$ be the corresponding density functions.
As in [4, 16], observe that the ratio of their densities is at most $e^{\epsilon}$, since for any real number $y$:

$$\frac{h_x(y)}{h_{x'}(y)} = \frac{\exp\left( -\frac{\epsilon k}{\Lambda} |y - \bar{z}| \right)}{\exp\left( -\frac{\epsilon k}{\Lambda} |y - \bar{z}'| \right)} \le \exp\left( \frac{\epsilon k}{\Lambda} |\bar{z} - \bar{z}'| \right) \le \exp(\epsilon) .$$

Similarly, the ratio is bounded below by $\exp(-\epsilon)$. For any measurable set $S \subseteq \mathbb{R}$ with non-zero
measure, the ratio $\Pr(T^*(x) \in S) / \Pr(T^*(x') \in S)$ is thus between $e^{-\epsilon}$ and $e^{\epsilon}$. This is exactly the
requirement of differential privacy. □
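
The density-ratio bound can be checked numerically. The sketch below is only an illustration: the values of the diameter, $k$, $\epsilon$ and the two block averages are hypothetical, and the check is performed on a finite grid of points rather than over all of $\mathbb{R}$.

```python
import math


def laplace_density(y, center, scale):
    """Density of a Laplace distribution with the given center and scale."""
    return math.exp(-abs(y - center) / scale) / (2.0 * scale)


def max_log_ratio(z_bar, z_bar_prime, diameter, k, eps, grid):
    """Largest |ln(h_x(y) / h_x'(y))| over the points in `grid`."""
    scale = diameter / (k * eps)
    return max(
        abs(math.log(laplace_density(y, z_bar, scale))
            - math.log(laplace_density(y, z_bar_prime, scale)))
        for y in grid
    )


diameter, k, eps = 1.0, 20, 0.5
z_bar = 0.40
z_bar_prime = z_bar + diameter / k     # worst case: one block estimate moved by the full diameter
grid = [i / 1000.0 for i in range(-1000, 2001)]
worst = max_log_ratio(z_bar, z_bar_prime, diameter, k, eps, grid)
print(worst <= eps + 1e-9)             # True
```

The bound is attained (up to floating-point error) at every grid point lying outside the interval between the two averages, which is why a small tolerance is added to the comparison.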

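Lemma 3.3. Under the regularity conditions of Lemma 2.2, if $\epsilon = \omega(1/\sqrt[6]{n})$ and $k$ is set appropriately,
the estimator $T^*$ is asymptotically unbiased, normal and efficient, that is,

$$\sqrt{n} \cdot (T^*(X) - \theta) \xrightarrow{D} N\left(0, \frac{1}{I_f(\theta)}\right) \quad \text{when } X = X_1, \ldots, X_n \sim f(\cdot; \theta) \text{ are i.i.d.}$$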

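Proof. We will select $k$ as a function of $n$ and $\epsilon$. For now, assume that $t = n/k$ goes to infinity with $n$.
By Lemma 2.2, each $Z_i = \hat{\theta}_{bc}(X_{(i-1)t+1}, \ldots, X_{it})$ converges to normal, and moreover the bias and
variance of $Z_i$ can be bounded:

$$E_\theta(Z_i) = \theta \pm O\left( (k/n)^{3/2} \right) \quad \text{and} \quad \mathrm{Var}_\theta(Z_i) = \frac{k}{n} \cdot \frac{1 + o(1)}{I_f(\theta)} .$$

Consider the averaged estimator $\bar{Z} = \frac{1}{k} \sum_i Z_i$. Its expectation is equal to the expectations of the
$Z_i$, while its variance scales as $1/k$:

$$E_\theta(\bar{Z}) = E_\theta(Z_1) = \theta \pm O\left( (k/n)^{3/2} \right)$$

$$\mathrm{Var}_\theta(\bar{Z}) = \frac{1}{k} \mathrm{Var}_\theta(Z_1) = \frac{1}{k} \cdot \frac{k}{n} \cdot \frac{1 + o(1)}{I_f(\theta)} = \frac{1 + o(1)}{n I_f(\theta)}$$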
Recall that the mean squared error $J(\theta)$ is the sum of the variance and squared bias of an estimator.
Since the squared bias is $O(k^3/n^3)$, it vanishes asymptotically compared to the variance as long as
$k = o(n^{2/3})$, that is, as long as $k/n^{2/3} \to 0$.

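Thus, for sufficiently small $k$, the estimator $\bar{Z}$ is efficient. We now consider for which values of $k$
the added noise is small enough so that it does not affect the efficiency of $T^*$. The noise added to $\bar{Z}$
to get $T^*$ does not contribute to the bias of the estimator, but does add to the variance. Specifically:

$$E_\theta(T^*(X)) = E_\theta(\bar{Z}) = \theta \pm O\left( (k/n)^{3/2} \right) \quad \text{and}$$

$$\mathrm{Var}_\theta(T^*(X)) = \mathrm{Var}_\theta(\bar{Z}) + \frac{2\Lambda^2}{k^2 \epsilon^2} = \frac{1}{n} \left( \frac{1 + o(1)}{I_f(\theta)} + \frac{2 n \Lambda^2}{k^2 \epsilon^2} \right),$$

so that

$$J_{T^*}(\theta) = \frac{1}{n} \left( \frac{1 + o(1)}{I_f(\theta)} + \frac{2 n \Lambda^2}{k^2 \epsilon^2} + O\left( \frac{k^3}{n^2} \right) \right) .$$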

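If $\epsilon = \omega(1/\sqrt[6]{n})$ then we can choose $k$ to ensure that $n J_{T^*}(\theta) \to 1/I_f(\theta)$. We need $k = o(n^{2/3})$
to get sufficiently small bias and $k = \omega(\sqrt{n}/\epsilon)$ to get the variance of the noise sufficiently low. Taking
$k = n^{3/5}/\epsilon^{2/5}$ balances the noise variance against the squared bias and yields a mean squared error whose
ratio to that of the MLE tends to 1, namely:

$$n \cdot J_{T^*}(\theta) = \frac{1 + o(1)}{I_f(\theta)} + O\left( \frac{\Lambda^2 + 1}{n^{1/5} \epsilon^{6/5}} \right) .$$

Since $\Lambda$ is constant with respect to $n$, $T^*$ is efficient as long as $\epsilon n^{1/6} \to \infty$, as desired. □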
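
The trade-off behind the choice of $k$ can be illustrated numerically. The sketch below is an assumption-laden illustration: the function names are hypothetical, the constant hidden in the $O(k^3/n^2)$ bias term is taken to be 1, and the schedule $\epsilon = n^{-1/10}$ is just one example of a sequence that tends to 0 more slowly than $n^{-1/6}$.

```python
def choose_k(n, eps):
    """Number of blocks balancing noise variance and squared bias: n^(3/5) / eps^(2/5)."""
    return max(1, round(n ** 0.6 / eps ** 0.4))


def excess_terms(n, eps, k, diameter):
    """Noise and squared-bias contributions to n * J(theta), with the O(.) constant set to 1."""
    noise = 2.0 * n * diameter ** 2 / (k ** 2 * eps ** 2)
    bias_sq = k ** 3 / n ** 2
    return noise, bias_sq


for n in (10**4, 10**6, 10**8):
    eps = n ** -0.1                     # hypothetical privacy schedule
    k = choose_k(n, eps)
    noise, bias_sq = excess_terms(n, eps, k, diameter=1.0)
    print(n, k, noise, bias_sq)
```

Both excess terms shrink as $n$ grows, at the slow rate $n^{-1/5} \epsilon^{-6/5}$, which is consistent with the error bound stated in Theorem 3.1.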